-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement Block Streamer Bitmap Operations #747
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I have no idea what's going on here, mainly due to unfamiliarity with the algorithm used. It will take some time for me to understand, so rather than block you, I've provided the best review I can with my limited knowledge. I'm trusting that you know what's going on 😄.
If it's possible, it would be good to break up the code up in to more well-defined/cohesive methods, but that doesn't need to happen here.
There are a lot of clippy
errors, could you run cargo clippy
and fix please?
@@ -33,6 +33,7 @@ tonic = "0.10.2" | |||
wildmatch = "2.1.1" | |||
|
|||
registry-types = { path = "../registry/types" } | |||
base64 = "0.22.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you place with the other versioned imports please?
block-streamer/src/bitmap.rs
Outdated
Self {} | ||
} | ||
|
||
pub fn get_bit(&self, byte_array: &[u8], bit_index: usize) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn get_bit(&self, byte_array: &[u8], bit_index: usize) -> bool { | |
pub fn get_bit(&self, bytes: &[u8], bit_index: usize) -> bool { |
Nit: bytes
seems sufficient here?
block-streamer/src/bitmap.rs
Outdated
(byte_array[byte_index] & (1u8 << (7 - bit_index_in_byte))) > 0 | ||
} | ||
|
||
fn set_bit(&self, byte_array: &mut [u8], bit_index: usize, bit_value: bool, write_zero: bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we take both bit_value
and write_zero
? Would bit_value
alone be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because we sometimes want to write the 0 over a 1, when we usually don't want to. Specifically when we are replacing one Elias Gamma encoding over another, as the length might be shorter (leaving extra 1's that should be zero). Technically we don't need it in the current code, but I ported it over as its exactly how we have it in the indexer logic.
block-streamer/src/bitmap.rs
Outdated
} | ||
} | ||
|
||
fn get_number_between_bits( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is number
? Can we be more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm basically we encoded a number as binary and are simply reading the binary value from a particular stretch of bits. Perhaps I can rename this to read_integer_from_binary
? Even though all our functions deal with binary, maybe this can explicitly state this binary is utilized to build an integer.
block-streamer/src/bitmap.rs
Outdated
&self, | ||
byte_array: &[u8], | ||
start_bit_index: usize, | ||
) -> anyhow::Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> anyhow::Result<usize> { | |
) -> Option<usize> { |
Option
seems more idiomatic here
block-streamer/src/bitmap.rs
Outdated
return EliasGammaDecoded { | ||
value: 0, | ||
last_bit_index: 0, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return EliasGammaDecoded { | |
value: 0, | |
last_bit_index: 0, | |
}; | |
return EliasGammaDecoded::default() |
Could use Default
here, but you'll need to derive
it on EliasGammaDecoded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea actually. It would definitely make the match look nicer too. It seems the default for usize is 0 anyway.
block-streamer/src/bitmap.rs
Outdated
fn decompress_bitmap(&self, compressed_bitmap: &[u8]) -> Vec<u8> { | ||
let compressed_bit_length: usize = compressed_bitmap.len() * 8; | ||
let mut current_bit_value: bool = (compressed_bitmap[0] & 0b10000000) > 0; | ||
let mut decompressed_byte_array: Vec<u8> = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know the length of this upfront? Vec::with_capacity()
would avoid unnecessary re-allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we knew capacity, maybe we could just use &[u8]
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know the size upfront. We need to decompress the EG to know how long the bit sequence is for each EG, and we can have many of them. We do know the upper bound, which is 86000 bits, since 1 bit per block and 86000 seconds in a day. But, I felt it was unnecessary to create 12KB byte arrays every time as we usually don't need that many.
block-streamer/src/bitmap.rs
Outdated
decompressed_byte_array | ||
} | ||
|
||
fn merge_compressed_bitmap_into_base_bitmap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between base_bitmap
and compressed_bitmap
? Maybe this would be more obvious if we just had a merge
function, and called decompress
from the outside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge
could even be defined on the Bitmap
struct instead for further clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the confusion is that it is doing two different things. Decompression, and merging. Before decompression, it matters which bitmap is the compressed one as we want to ensure bits re written to the decompressed one. But if the bitmaps are both decompressed, this is no longer an issue.
I think the better way to go forward is creating a merge_bitmap
function like you mentioned but keep it in BitmapOperator. Then we do a three step sequence in the public get_merged_bitmap
function: decode, decompress, merge. This I think would be clear while retaining BitmapOperator as a stateless utility class. I'm a little confused with how a Bitmap struct function would perform merge. I imagine it would need to have a BitmapOperator internally. I think it might make things confusing regarding who owns these data operator functions.
If there's a more clear way to structure this class, I'm happy to rework it when you're back!
block-streamer/src/bitmap.rs
Outdated
Ok(()) | ||
} | ||
|
||
pub fn get_merged_bitmap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn get_merged_bitmap( | |
pub fn merge_bitmaps( |
Nit: this seems more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I was originally thinking of merge_compressed_bitmaps
but maybe its not worth requiring someone to know they're compressed before calling the function? Especially since the argument type is Base64Bitmap which should only really be received form the graphQL query.
block-streamer/src/bitmap.rs
Outdated
use super::*; | ||
|
||
#[test] | ||
fn test_getting_bit_from_array() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn test_getting_bit_from_array() { | |
fn getting_bit_from_array() { |
Nit: test_
seems superfluous here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. I'll reword the test names.
The bitmap indexer will return a list of bitmaps in the form of base 64 strings, and associated start block heights. We need a way to convert all that data into a single block height and an associated bitmap.
This PR introduces a new BitmapOperator class which holds all the operations necessary to perform the function of returning a combined binary bitmap with the lowest start block height as index 0.